--- %%NOBANNER%% -->
/*------------------<--- Start of Description -->--------------------\
| This macro analyzes case-control studies where cases and controls |
| are matched. It is a replacement for the SAS Version 5 procedure |
| PROC MCSTRAT. |
|--------------------<--- End of Description -->---------------------|
|--------------------------------------------------------------------|
|--------------<--- Start of Files or Arguments Needed -->-----------|
| The following keyword macro parameters are used: |
| * DATA Name of input dataset. |
| ID Requests that a list of sets not included in |
| the model be printed. Values are YES and NO(default) |
| COV Requests that the covariance matrix be printed. |
| Values are YES and NO(default). |
| OUTDATA Names a SAS dataset to be created which contains |
| all observations used in the model. |
| MAXITER Specifies the maximum number of iterations to be |
| performed. Default is 10. |
| EPSILON Specifies the difference in log likelihood used to |
| determine convergence. Default is .000001. |
| UNI Requests that univariate statistics be printed for |
| independent variables. Values are YES(default) and NO |
| MINCNTL Specifies the minimum number of controls required in |
| each matched set. Default is 1. |
| MINCASE Specifies the minimum number of cases required in |
| each matched set. Default is 1. |
| * SETID Name of variable which identifies matched sets in |
| the input dataset. Default is SETID. |
| * CASE Name of the case-control indicator variable. Values |
| must be 1 for cases and 0 for controls. |
| * INDVAR Names of independent variables separated by blanks. |
| ALL VARIABLE NAMES SHOULD BE 7 CHARACTERS OR LESS |
| TABLES List of independent variables for which you want |
| frequency tables. Separate variable names with |
| blanks. The variables should be 1/0 or 1/2 |
| indicators. The tables show how many cases and |
| controls had a 1 in each set. |
| DIAG Requests that output data sets containing regression |
| diagnostics be created. Values are YES(default) |
| and NO. Two data sets created: one on an |
| individual / subject level (SUBDIAG) and one on a |
| matched set level (SETDIAG). |
| |
| * Required parameters: user must supply a value. |
| |
| |
| Contents of data sets created by specifying DIAG=YES: |
| SUBDIAG contains regression diagnostics on an individual |
| level. Number of observations is equal to the number |
| of observations used to fit the logistic model. |
| all independent variables in the logistic model |
| (specified in the INDVAR parameter) |
| the case variable (specified in the CASE parameter) |
| the set id variable(specified in the SETID parameter) |
| XI -> model fitted values (interpreted as the |
| probability the individual is a case)* |
| DELTAX2 -> delta chi-square statistic assessing effect|
| of observations on overall model fit* |
| INFL -> influence statistic assessing effect of |
| observation on overall model fit* |
| HAT -> leverage value* |
| LD -> likelihood displacement statistic assessing|
| effect of observation on overal model fit**|
| LMAX -> LMAX statistic assessing effect of |
| observation on overall model fit** |
| D(var) -> delta-beta (DFBETA) statistic assessing |
| effect of observation on a particular |
| covariate in the model. (var) corresponds |
| to first seven characters in the variable |
| name for that particular covariate. One |
| diagnostic for each variable in the model**|
| S(var) -> scaled DFBETA statistics, created by |
| dividing original DFBETA by coefficient |
| standard error from the estimated |
| covariance matrix** |
| |
| SETDIAG contains sums of values from SUBDIAG data set |
| (summed over all observations in matched set). Number|
| of observations is equal to the number of matched sets|
| used to fit logistic model. |
| the setid variable (specified in the SETID parameter) |
| DELTAX2 -> sum of DELTAX2 diagnostics from SUBDIAG |
| data set |
| INFL -> sum of INFL diagnostics from SUBDIAG |
| data set |
| HAT -> sum of leverage diagnostics from SUBDIAG |
| data set |
| LD -> sum of LD diagnostics from SUBDIAG |
| data set |
| LMAX -> sum of LMAX diagnostics from SUBDIAG |
| data set |
| D(var) -> sum of DFBETA diagnostics for a |
| particular covariate from SUBDIAG data |
| set. One diagnostic for each variable in |
| the model. |
|---------------<--- End of Files or Arguments Needed -->------------|
|--------------------------------------------------------------------|
|----------------<--- Start of Example and Usage -->-----------------|
| Example: |
| Low birth weight data set (LOWWGT). Cases are mothers of |
| low birth weight babies, controls are age-matched mothers of |
| normal birth weight babies. Three controls matched to each case.|
| Variable CASE distinguishes cases from controls (cases have |
| value of 1, controls 0). Independent variables are SMOKE, UI, |
| PTD, LWD. Set id variable is SET. User wants descriptive |
| statistics to be printed for each independent variable. User |
| wants data sets of diagnostics to be created. User wants |
| frequency tables for variables SMOKE and UI. Macro call is as |
| follows... |
| %mcstrat(data=lowwgt,setid=set,indvar=smoke ui ptd lwd, |
| case=case, uni=yes, diag=yes, tables=smoke ui) |
| |
| The following dataset names are reserved and should not be used |
| by the calling program: |
| __badset __caco __dat __dat1 __out1 __out2 __out3 |
| __out4 __out5 __tab __xbeta __hat |
| subdiag setdiag |
| |
| The following variable names are reserved and should not be used |
| by the calling program: |
| __case1 __cntl1 __del __df1-__dfn (n=# of indep. variables) |
| __sco1-__scon __sch1-__schn |
| __name __ncase __ncntl __nobs __nsets __ratio __tcase |
| __tcntl __time __tused n_case n_cntl __sumr __suml __sumld|
| __sumi __sumh __sumd1-__sumdn __sums1-__sumsn lmax ld deltax2 |
| infl hat xi |
| any name which would be the first 7 characters of an independent|
| variable name prefixed with the letter s |
| any name which would be the first 7 characters of an independent|
| variable name prefixed with the letter d |
| Usage: %mcstrat(data=,id=no,cov=no,outdata=,maxiter=10, |
| epsilon=.000001,uni=yes,mincntl=1,mincase=1, |
| setid=setid,case=,indvar=,tables=,diag=yes); |
| Reference: |
| * for more information see Hosmer DW and Lemeshow S. Applied |
| Logistic Regression. New York: John Wiley and Sons, Inc., 1989.|
| ** for more information see SAS Institute Inc. SAS/STAT Software: |
| Changes and Enhancements through Release 6.12. Cary, NC: SAS |
| Institute, Inc., 1997, pp. 895-900 (the PHREG procedure). |
\-------------------<--- End of Example and Usage -->---------------*/
%macro mcstrat(data=,id=no,cov=no,outdata=,maxiter=10,
epsilon=.000001,uni=yes,mincntl=1,mincase=1,
setid=setid,case=,indvar=,tables=,diag=yes);
/*--------------------------------------------\
| Author: Rob Vierkant and Jon Kosanke; |
| Created: December 6, 1999; |
| Purpose: Analyzes case-control studies where|
| cases and controls are matched. It|
| is a replacement for SAS Version 5 |
| procedure PROC MCSTRAT. |
\--------------------------------------------*/
%****local macro parameters;
%local i t ntab err nobsorig nobsused setsorig setsused ls
name newname tabvar;
%****adjusting macro parameters and checking for errors;
%let id=%upcase(&id);
%let cov=%upcase(&cov);
%let uni=%upcase(&uni);
%let setid=%upcase(&setid);
%let case=%upcase(&case);
%let indvar=%upcase(&indvar);
%let tables=%upcase(&tables);
%let diag=%upcase(&diag);
%let err=0;
%if %length(&data)=0 %then %do;
%put 'ERROR: NO INPUT DATASET SUPPLIED';
%let err=1;
%end;
%if &id ^= NO & &id ^= N & &id ^= YES & &id ^= Y %then %do;
%put 'ERROR: ID MUST BE YES OR NO';
%let err=1;
%end;
%if &cov ^= NO & &cov ^= N & &cov ^= YES & &cov ^= Y %then %do;
%put 'ERROR: COV MUST BE YES OR NO';
%let err=1;
%end;
%if &uni ^= NO & &uni ^= N & &uni ^= YES & &uni ^= Y %then %do;
%put 'ERROR: UNI MUST BE YES OR NO';
%let err=1;
%end;
%if &diag ^= NO & &diag ^= N & &diag ^= YES & &diag ^= Y %then %do;
%put 'ERROR: DIAG MUST BE YES OR NO';
%let err=1;
%end;
%if %length(&case)=0 %then %do;
%put 'ERROR: NO CASE VARIABLE SUPPLIED';
%let err=1;
%end;
%if %length(&indvar)=0 %then %do;
%put 'ERROR: NO INDEPENDENT VARIABLES SUPPLIED';
%let err=1;
%end;
%if %length(&setid)=0 %then %do;
%put 'ERROR: NO SETID VARIABLE SUPPLIED';
%let err=1;
%end;
%****small macro used later on in the program;
%macro var;
%do i=1 %to &numvars;
&&var&i
%end;
%mend var;
%****count independent vars and set up macro variables
var1,var2,var3...;
%let numvars=0;
%do %until(%scan(&indvar,&numvars+1,' ')= );
%let numvars=%eval(&numvars+1);
%let var&numvars=%scan(&indvar,&numvars,' ');
%local var&numvars varb&numvars;
%end;
%****create 7 character variable used to later name individual
delta-betas and scaled delta-betas;
%do i=1 %to &numvars;
%if %length(&&var&i)=8 %then %do;
%let varb&i=%substr(&&var&i,1,7);
%end;
%else %let varb&i=&&var&i;
%end;
****determines title number to use in output;
proc sql;
reset noprint;
select max(number) into:t from dictionary.titles where type='T';
quit;
%let t=%eval(&t+2);
%if %eval(&t)>10 %then %let t=10;
%****if no errors then produce output;
%if &err=0 %then %do;
%do i=1 %to &numvars;
%local se&i;
%end;
%let ntab=0;
%do %until(%scan(&tables,&ntab+1,' ')= );
%let ntab=%eval(&ntab+1);
%end;
****format for cases and controls;
proc format;
value x 0='Control'
1='Case';
****original data set;
data __dat1; set &data;
proc sort; by &setid &case;
****macro parameters for # of original observations and sets;
data _null_; set __dat1 end=eof; by &setid;
__nobs+1;
if first.&setid then __nsets+1;
if eof;
call symput('nobsorig',trim(left(put(__nobs,6.))));
call symput('setsorig',trim(left(put(__nsets,6.))));
****creating time variable, deleting observations with
missing values;
data __dat1; set __dat1;
__time=1;
%do i=1 %to &numvars;
if &&var&i=. then delete;
%end;
****finding numbers of cases and controls in each matched set,
and determining if each least as many as required;
data __badset; set __dat1; by &setid &case;
keep &setid;
retain __ncntl __ncase;
if first.&setid then do;
__ncntl=0;
__ncase=0;
end;
__del=0;
if &case ^in (0 1) then __del=1;
if __del=0 & &case=0 then __ncntl+1;
if __del=0 & &case=1 then __ncase+1;
if last.&setid then do;
if __ncntl<&mincntl | __ncase<&mincase then output;
end;
****merging numbers of cases and controls back in with original
data, and deleting sets without required number of cases
and controls;
data __dat;
merge __dat1 __badset(in=inb); by &setid;
if inb then delete;
****creates data set that includes all observations used in
analysis (if data set name for outdata is specified);
%if &outdata^= %then %do;
data &outdata; set __dat;
drop __time;
%end;
****determines current linesize option;
proc sql;
reset noprint;
select setting into:ls from dictionary.options
where optname='LINESIZE';
****macro parameters of number of observations and sets used
in analysis;
data _null_; set __dat end=eof; by &setid;
__nobs+1;
if first.&setid then __nsets+1;
if eof;
call symput('nobsused',trim(left(put(__nobs,6.))));
call symput('setsused',trim(left(put(__nsets,6.))));
****summarizing number of cases and controls in each set to
be placed in table later;
proc freq data=__dat noprint;
tables &setid*&case/out=__out1;
data __caco; set __out1 end=eof; by &setid;
keep &setid __ncase __ncntl;
retain __ncase __ncntl;
if first.&setid then do;
__ncase=.;
__ncntl=.;
end;
if &case=0 then __ncntl=count;
if &case=1 then __ncase=count;
if last.&setid;
****printing table describing matched sets;
proc freq noprint;
tables __ncase*__ncntl/out=__out2;
data _null_; set __out2 end=eof;
file print;
if _n_=1 then do;
put @((&ls-60)/2)
'MCSTRAT: LINEAR LOGISTIC REGRESSION ANALYSIS FOR MATCHED SETS';
put @((&ls-60)/2) 60*'=';
put / @((&ls-14)/2)
"SETID = &setid";
put @((&ls-30)/2)
"CASE/CONTROL INDICATOR = &case";
put // @((&ls-30)/2)
"# OF OBSERVATIONS READ = &nobsorig";
put @((&ls-30)/2)
"# OF OBSERVATIONS USED = &nobsused";
put @((&ls-30)/2)
"# OF MATCHED SETS READ = &setsorig";
put @((&ls-30)/2)
"# OF MATCHED SETS USED = &setsused";
put /// @((&ls-32)/2)
'SUMMARY OF MATCHED SETS ANALYZED';
put @((&ls-32)/2) 32*'=';
put / @((&ls-38)/2)
'# CASES # CONTROLS # MATCHED SETS';
put @((&ls-38)/2)
'======= ========== ==============';
end;
put @((&ls-38)/2)
__ncase 7. +6 __ncntl 7. +10 count 7.;
__tcase+__ncase*count;
__tcntl+__ncntl*count;
__tused+count;
if eof then do;
put @((&ls-38)/2)
'=====================================';
put @((&ls-38)/2) __tcase 7. +6 __tcntl 7. +10 __tused 7.;
end;
****printing univariate statistics for matched sets, if requested;
%if &uni=YES | &uni=Y %then %do;
proc means data=__dat;
class &case;
var %var;
format &case x.;
title&t 'Univariate Statistics for Matched Sets Used';
%end;
****creating additional tables of case and control data;
%if not(&tables=) %then %do;
%do i=1 %to &ntab;
%let tabvar=%scan(&tables,&i);
data __tab;
merge __dat __caco; by &setid;
retain __case1 __cntl1;
if first.&setid then do;
__case1=0;
__cntl1=0;
end;
if &case=0 & &tabvar=1 then __cntl1+1;
if &case=1 & &tabvar=1 then __case1+1;
if last.&setid;
__ratio=compbl(put(__ncase,5.) || ' : '
|| put(__ncntl,5.));
proc freq noprint;
tables __ratio*__case1*__cntl1/out=__out3;
proc tabulate format=12.;
class __cntl1 __ratio __case1;
var count;
table __ratio*__case1,__cntl1*count;
keylabel n=' ' sum=' ';
label __ratio='Case-Control Ratio'
__case1='# Cases'
__cntl1='# Controls';
title&t "# Cases vs. # Controls Per Matched Set Where &tabvar=1";
%end;
%end;
****PHREG procedure to fit discrete logistic model to data;
proc phreg data=__dat nosummary
%if &diag=YES | &diag=Y %then %do;
covout outest=__out4
%end;
;
model __time*&case(0)=%var / ties=discrete itprint
%if &cov^=NO & &cov ^=N %then %do;
covb
%end;
maxiter=&maxiter convergelike=&epsilon risklimits;
strata &setid;
%if &diag=YES | &diag=Y %then %do;
****output data set containing diagnostics;
output out=__out5(drop=__time)
dfbeta=__df1-__df&numvars xbeta=xbeta lmax=lmax ld=ld
ressco=__sco1-__sco&numvars resmart=resmart;
title&t 'Results of Modeling';
****standard errors necessary for calculating scaled DFBETAs;
data __out4; set __out4;
if _type_='COV';
drop _ties_ _type_ _name_ _lnlike_;
data _null_; set __out4;
array cov(&numvars) %var;
__name='se' || left(put(_n_,5.));
call symput(__name,sqrt(cov(_n_)));
****create fitted values and covariates centered about
weighted-specific mean, to be used in Hosmer and Lemeshow
diagnostics;
proc sort data=__out5; by &setid &case;
data __xbeta (drop=__df1-__df&numvars __sco1-__sco&numvars
lmax ld);
set __out5 nobs=last;
by &setid &case;
****fitted values xi;
xi=&case-resmart;
****covariates centered about weighted-specific mean;
%do i=1 %to &numvars;
__sch&i=__sco&i/resmart;
%end;
call symput('nobs',trim(left(put(last,6.))));
****matrix manipulations in PROC IML;
proc iml;
****create matrix of covariate vectors centered about
weighted-specific mean;
use __xbeta;
read all var {
%do i=1 %to &numvars;
__sch&i
%end;
} into x;
close __xbeta;
****create covariance matrix;
use __out4;
read all var {
%do i=1 %to &numvars;
&&var&i
%end;
} into h;
****create hat matrix and output diagonal elements to data
set hat;
hc=x*h*x`;
hat=vecdiag(hc);
create __hat var {hat};
append from hat;
close __hat __out4;
quit;
****merge the xbeta data set and the hat data set together
to create delta chi square, delta beta, leverage values,
and fitted values as seen in Hosmer and Lemeshow;
data subdiag (keep=&setid &case xbeta lmax ld xi deltax2 hat infl
%do i=1 %to &numvars;
&&var&i d&&varb&i s&&varb&i
%end;
);
merge __out5 __xbeta __hat;
****leverage values;
hat=hat*xi;
****square of standardized Pearson residual;
deltax2=((&case-xi)/sqrt(xi*(1-hat)))**2;
****influence statistic;
infl=deltax2*hat/(1-hat);
label deltax2='delta chi-square'
infl='overall influence statistic'
lmax='LMAX global influence statistic'
ld='likelihood displacement'
hat='leverage value from hat matrix'
xi='fitted values';
****individual regular and scaled influence statistics;
%do i=1 %to &numvars;
d&&varb&i=__df&i;
s&&varb&i=__df&i/&&se&i;
label d&&varb&i="delta beta for variable &&var&i"
s&&varb&i="scaled delta beta for variable &&var&i";
%end;
****sum above variables over entire stratum;
proc sort data=subdiag; by &setid; run;
data setdiag (drop=__sumr __sumi __suml __sumld __sumh
xbeta xi &case
%do i=1 %to &numvars;
__sumd&i &&var&i s&&varb&i
%end;
);
set subdiag; by &setid;
if first.&setid then do;
__sumr=0; __sumi=0; __suml=0; __sumld=0; __sumh=0;
%do i=1 %to &numvars;
__sumd&i=0;
%end;
end;
__sumr=__sumr+deltax2;
__sumi=__sumi+infl;
__suml=__suml+lmax;
__sumld=__sumld+ld;
__sumh=__sumh+hat;
%do i=1 %to &numvars;
__sumd&i=__sumd&i+d&&varb&i;
%end;
retain __sumr __sumi __suml __sumld __sumh
%do i=1 %to &numvars;
__sumd&i
%end;
;
if last.&setid then do;
deltax2=__sumr;
infl=__sumi;
lmax=__suml;
ld=__sumld;
hat=__sumh;
%do i=1 %to &numvars;
d&&varb&i=__sumd&i;
%end;
output;
label deltax2='sum of delta chi-square'
infl='sum of overall influence statistic'
lmax='sum of LMAX values'
ld='sum of likelihood displacement'
hat='sum of leverage values'
%do i=1 %to &numvars;
d&&varb&i="sum of delta beta for &&var&i"
%end;
;
end;
run;
%end;
%if &id=YES | &id=Y %then %do;
****prints out sets not used in model;
proc print data=__badset;
title&t 'Sets not included in model';
%end;
run;
proc datasets lib=work;
delete __badset __caco __dat __dat1 __out1 __out2 __out3 __out4
__tab __xbeta __hat;
run; quit;
title&t;
%end;
%mend mcstrat;